-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH/BUG: Use Kleene logic for groupby any/all #40819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mzeitlin11 Thanks a lot for working on this!
pandas/core/groupby/groupby.py
Outdated
from pandas.core.arrays.boolean import BooleanArray | ||
from pandas.core.arrays.masked import BaseMaskedArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can import both from from pandas.core.arrays import ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
pandas/core/groupby/groupby.py
Outdated
@@ -1435,6 +1446,7 @@ def result_to_bool(result: np.ndarray, inference: type) -> np.ndarray: | |||
post_processing=result_to_bool, | |||
val_test=val_test, | |||
skipna=skipna, | |||
masked=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this set to always be False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had originally used masked
being included in kwargs as a signal for whether to pass masked
into the groupby func. But the original value here wasn't relevant since it gets set before being used.
This is definitely confusing and a poor solution, changed to instead use a positional arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the update looks clearer to follow now
pandas/tests/groupby/test_any_all.py
Outdated
@pytest.mark.parametrize("bool_agg_func", ["any", "all"]) | ||
@pytest.mark.parametrize("skipna", [True, False]) | ||
@pytest.mark.parametrize( | ||
# expected indexed as [skipna][bool_agg_func == "all"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe also write out what that results in practice? So [skipna=False/any, skipna=False / all], [skipna=True / any, skipna=True / all], if I understand correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep have changed, that is a much clearer way to describe it
Hello @mzeitlin11! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-13 15:02:14 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! (and much cleaner now ;))
I only added a few tiny stylistic/doc suggestions you can apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls merge master as well
pandas/tests/groupby/test_any_all.py
Outdated
|
||
# The expected result we compared to should match aggregating on the whole | ||
# series | ||
result = getattr(df[0], bool_agg_func)(skipna=skipna) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check is really hard to parse.
what exactly are you checking? can you make this much simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this is essentially doing is series.any(skipna=skipna)
, but then with getattr
because it's generic for any/all. Do you have a concrete suggestion to change? (or to clarify the comment?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a complicated expexcted then a really complicated assert. This needs to be greatly simplified. It is impossible to grok with all of this logic. my suggesetion is to break this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be very concrete, you mean breaking it up into 2 lines, like
series = df[0]
getattr(series, method)(skipna=skipna)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have tried to simplify by moving this validation of "expected" into a separate test in test_reductions.py
(replacing the existing test there as this tests a superset of the logic).
pandas/tests/groupby/test_any_all.py
Outdated
def test_masked_kleene_logic(bool_agg_func, data, expected_data, skipna): | ||
# GH#37506 | ||
df = DataFrame(data, dtype="boolean") | ||
expected = DataFrame( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this expected is really hard to parse can you do in multiple steps / make simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing your comment above makes this simpler, let me know if you think any piece is still confusing
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
…to enh/any_all_kleene
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok looks good. question on the rendering in the notes, ping on green.
Greenish, failures unrelated and fixed by #40930 and flaky zip fix that just got merged |
thanks @mzeitlin11 very nice! |
Built on #40811
ASV's don't look affected: